Science Inventory

Identification of candidate reference chemicals using multidimensional literature and database mining with EPA’s PubMed Abstract Sifter

Citation:

Baker, Nancy C. AND T. Knudsen. Identification of candidate reference chemicals using multidimensional literature and database mining with EPA’s PubMed Abstract Sifter. American Society for Cellular and Computational Toxicology (ASCCT), Chapel Hill, NC, October 19 - 21, 2022. https://doi.org/10.23645/epacomptox.21950867

Impact/Purpose:

Poster presentation at the American Society for Cellular and Computational Toxicology (ASCCT) annual meeting, October 2022, providing updates on a tool for high-throughput literature and database mining to support virtual tissue models and new approach methods (NAMs). This work also aligns with Tox21 cross-partner project CPP13.

Description:

Identifying collections of reference chemicals for calibrating in vitro assays and in silico models is an important task in toxicological research. A strong set of reference chemicals can bolster confidence in the accuracy and predictive value of new approach methods (NAMs) for toxicological hazard evaluation. While a growing amount of data on chemical bioactivity is available for this purpose in structured databases such as EPA’s CompTox Chemicals Dashboard, EBI’s ChEMBL, or NIEHS/NTP’s Integrated Chemical Environment (ICE), the biomedical literature still contains a vast amount of unstructured data on biological pathways and processes underlying chemical toxicity. Here we demonstrate methods of finding reference chemicals in the literature using the US EPA’s Abstract Sifter tool. The tool’s MeSHMine text-mining feature can be used to extract curated chemical information annotated in articles on various domains of toxicological concern, such as developmental and organ-specific toxicity. Automated complex queries search for available information on different biological systems and return a rich list of candidate chemical compounds, as well as relevant proteins, metabolites, and genes in the system. The extracted list can be filtered to reduce dimensionality and focus on candidate reference compound selection. In parallel, tables of chemicals with information on bioactivity can be extracted from PDF files and incorporated into the Abstract Sifter. The chemical lists gathered by these two methods are further enhanced by Application Programming Interface (API) retrieval of chemical identifiers (DSSToxIDs) and structural identifiers from the US EPA’s CompTox Chemicals Dashboard, and the chemical names can be used in subsequent subject-area PubMed searches for mechanistic information. Integrating these literature mining methods streamlines the search for candidate reference chemicals to inform NAMs and construct chemical sets for computational modeling.
This abstract does not necessarily reflect US EPA policy.
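
The filtering step described above (tallying chemicals annotated across a set of retrieved articles, then narrowing the list) can be illustrated with a minimal Python sketch. This is not the Abstract Sifter's or MeSHMine's actual implementation; the record structure, the function name rank_candidates, the min_articles threshold, and all data values are hypothetical and invented purely for illustration.

```python
from collections import Counter

# Hypothetical toy input: article ID -> chemical terms annotated on that article.
# All identifiers and names below are invented placeholders, not real data.
records = {
    "article-1": ["Chemical A", "Chemical B"],
    "article-2": ["Chemical A", "Chemical C"],
    "article-3": ["Chemical A", "Chemical B", "Chemical D"],
}


def rank_candidates(records, min_articles=2):
    """Count how many articles mention each chemical and keep frequent ones.

    min_articles is an assumed cutoff for illustration; a real workflow
    would choose its own filtering criteria.
    """
    counts = Counter()
    for terms in records.values():
        # set() ensures a chemical is counted at most once per article
        counts.update(set(terms))
    kept = [(name, n) for name, n in counts.items() if n >= min_articles]
    # Sort by descending article count, then alphabetically for stable output
    return sorted(kept, key=lambda item: (-item[1], item[0]))


candidates = rank_candidates(records)
for name, n in candidates:
    print(f"{name}\t{n}")
```

In this toy case, only the chemicals annotated in at least two of the three articles remain on the candidate list; identifiers for the surviving names could then be looked up in a separate step.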

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 10/21/2022
Record Last Revised: 01/24/2023
OMB Category: Other
Record ID: 356884